Creating a medical English-Swedish dictionary using interactive word alignment

نویسندگان

  • Mikael Nyström
  • Magnus Merkel
  • Lars Ahrenberg
  • Pierre Zweigenbaum
  • Håkan Petersson
  • Hans Åhlfeldt
چکیده

BACKGROUND This paper reports on a parallel collection of rubrics from the medical terminology systems ICD-10, ICF, MeSH, NCSP and KSH97-P and its use for semi-automatic creation of an English-Swedish dictionary of medical terminology. The methods presented are relevant for many other West European language pairs than English-Swedish. METHODS The medical terminology systems were collected in electronic format in both English and Swedish and the rubrics were extracted in parallel language pairs. Initially, interactive word alignment was used to create training data from a sample. Then the training data were utilised in automatic word alignment in order to generate candidate term pairs. The last step was manual verification of the term pair candidates. RESULTS A dictionary of 31,000 verified entries has been created in less than three man weeks, thus with considerably less time and effort needed compared to a manual approach, and without compromising quality. As a side effect of our work we found 40 different translation problems in the terminology systems and these results indicate the power of the method for finding inconsistencies in terminology translations. We also report on some factors that may contribute to making the process of dictionary creation with similar tools even more expedient. Finally, the contribution is discussed in relation to other ongoing efforts in constructing medical lexicons for non-English languages. CONCLUSION In three man weeks we were able to produce a medical English-Swedish dictionary consisting of 31,000 entries and also found hidden translation errors in the utilized medical terminology systems.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Creating a medical dictionary using word alignment: The influence of sources and resources

BACKGROUND Automatic word alignment of parallel texts with the same content in different languages is among other things used to generate dictionaries for new translations. The quality of the generated word alignment depends on the quality of the input resources. In this paper we report on automatic word alignment of the English and Swedish versions of the medical terminology systems ICD-10, IC...

متن کامل

Creating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction

This paper first describes an experiment to construct an English-Chinese parallel corpus, then applying the Uplug word alignment tool on the corpus and finally produce and evaluate an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...

متن کامل

The Impact of Lemmatization in Word Alignment

The focus of this thesis is on examining whether word alignment results can be improved in precision and recall through lemmatization, and extraction of lemma dictionaries from the resulting links. Lemmas are extracted from existing lexical resources in order to replace word forms in two parallel corpora documents, one featuring the language pair English-Swedish and the other the language pair ...

متن کامل

Creating a free digital Japanese-Swedish lexicon

This paper describes the creation of a Japanese-Swedish dictionary. The well known technique of using English as an intermediate language was used. A few modifications of this method are presented, such as weighting words with an idf-like measure and allowing one Japanese word to be described by several Swedish words when there is no directly corresponding translation available. For about 60,00...

متن کامل

Creating a Bilingual Psychology Lexicon for Cross Lingual Question Answering - A Pilot Study

This paper introduces a pilot study aimed at investigating the extraction of word relations from a sample of a medical parallel corpus in the field of Psychology. Word relations are extracted in order to create a bilingual lexicon for cross lingual question answering between Swedish and English. Four different variants of the sample corpus were utilized: word inflections with and without POS ta...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • BMC Medical Informatics and Decision Making

دوره 6  شماره 

صفحات  -

تاریخ انتشار 2006